Description

Method for managing and monitoring the operation of a plurality of distributed hardware and/or software systems that are integrated into at least one communications network, and system for carrying out the method

The invention relates to a method for managing and monitoring the operation of a plurality of distributed hardware and/or software systems that are integrated into at least one communications network.

For reasons of cost and efficiency, distributed hardware and/or software systems have increasingly been used in recent years, in the business sector in particular. Such systems can be operated in a virtual environment using the possibilities of "adaptive computing", in which, as a development of conventional systems, the hardware can also be adapted to the requirements of the current application. Software systems which are becoming ever more complex are being operated in an increasingly heterogeneous hardware world. The assignment between software entities and hardware resources is no longer fixed but varies dynamically depending on the current requirements.

Such distributed hardware environments cannot be managed and monitored using the conventional management and monitoring tools, which presuppose a fixed assignment between hardware and software. On account of the continuous dynamic configuration changes in the systems, which result, for example, from the self-healing mechanisms implemented by the system, the administrator's purely manual way of working is hardly practical any more.

Therefore, the invention is based on the object of specifying an improved method for managing and monitoring the operation of a plurality of distributed hardware and/or software systems.

In order to achieve this object, a method of the type mentioned initially provides, according to the invention, for a central program means that is stored in a data processing device to process system-related data which are present in the data processing device or are received by the latter via a communications network, to autonomously derive operation-related decisions from said data and, on the basis of said decisions, to generate decision-specific control data for influencing the operation of one or more hardware and/or software systems and to transmit said control data, via the communications network, to data processing devices which are assigned to the respective hardware and/or software systems.
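
The sequence described above (process system-related data, derive decisions, generate control data, transmit) can be illustrated with a minimal Python sketch. The class names, the fields, the load limit of 0.9 and the action names "restart" and "move_service" are assumptions made purely for illustration and are not taken from the description; the `send` callable stands in for the communications network.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemReport:
    """System-related data received from one managed system (hypothetical fields)."""
    system: str
    load: float      # current operational load, 0.0 to 1.0
    healthy: bool


@dataclass(frozen=True)
class ControlMessage:
    """Decision-specific control data addressed to one system."""
    target: str
    action: str


def derive_control_data(reports, load_limit=0.9):
    """Derive a decision per report and express it as a control message."""
    messages = []
    for report in reports:
        if not report.healthy:
            # Assumed rule: a faulty system is restarted.
            messages.append(ControlMessage(report.system, "restart"))
        elif report.load > load_limit:
            # Assumed rule: an overloaded system gives up a service.
            messages.append(ControlMessage(report.system, "move_service"))
    return messages


def transmit(messages, send):
    """Pass each message to a transport callable standing in for the network."""
    for message in messages:
        send(message.target, message)
```

In this sketch a system that is healthy and below the load limit produces no control data at all, which mirrors the idea that control data are generated only where a decision requires intervention.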

The central program means is thus capable of automatically and autonomously carrying out essential management, administration and monitoring tasks. It combines capabilities and functions which can nowadays be furnished only in part by administrators and by system management and monitoring tools and which it has hitherto not been possible to furnish sufficiently in the field of "adaptive computing". An important basis of the method according to the invention is the decision-making component of the central autonomous program means. Control data which, for example, stop a hardware and/or software system or move a particular application are generated on the basis of the decisions made in this manner. The control data are transmitted, via the communications network, to the individual systems which are affected by the respective decisions. In this manner, in the method according to the invention, the central program means undertakes numerous tasks which, in conventional hardware and software environments, are manually undertaken by administrators.

One development of the concept of the invention provides for the central program means to access rule data, which comprise, in particular, rules regarding priorities and/or sequences and/or logical and/or temporal relationships, and/or performance data, which relate, in particular, to the current operational load and/or the temporally restricted and/or dynamic and/or periodically needed capacity requirement, and/or grouping data and/or classification data and/or availability data, said data being stored in the data processing device. The rule data form a rule system which prescribes a basic structure for the management or administration and monitoring method. Priority rules may define, for example, the preference for the interactive mode over batch processing in an application entity. Sequences may determine which services have to be stopped first in the event of a stoppage. System components may have to resort to other systems or to results provided by other system components. In such cases, it is necessary to take into account a number of logical and/or temporal relationships that the method obtains from the rule data. A software system requires sufficient hardware resources. In order to determine the capacities required and the regular operational load on the hardware systems, the method according to the invention can access the performance data. Performance data relate, for example, to the current operational load or to the capacity regularly required by an application that runs at certain intervals of time. Said data provide a measure of the performance of the system environment. For effective management, it is also expedient to divide the system environment, together with its components and the tasks to be carried out by it, into different groups or classes. The associated grouping and classification data may correspondingly relate to structural aspects (for example in the case of identical hardware) and to content-related aspects (for example in the case of components which interact in order to solve a problem). In addition, the method accesses data relating to the availability of individual systems. For example, the method thus determines whether and where the resources, for example CPUs or main memories, needed for an application that is running according to plan are available.
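
How priority rules and stop sequences of this kind might be held and evaluated can be sketched as follows. This is only an illustration: the `RuleData` container, its two fields and the service names are hypothetical, and the description does not prescribe any particular data layout.

```python
from dataclasses import dataclass, field


@dataclass
class RuleData:
    """Hypothetical container for two kinds of rule data."""
    priorities: dict = field(default_factory=dict)     # service -> rank, lower is preferred
    stop_sequence: list = field(default_factory=list)  # services in the order they are stopped


def preferred(rules, services):
    """Sort services by priority; services without a rule come last."""
    return sorted(services, key=lambda s: rules.priorities.get(s, float("inf")))


def stop_order(rules, running):
    """Order the running services by the stop sequence; unlisted ones are stopped last."""
    listed = [s for s in rules.stop_sequence if s in running]
    unlisted = sorted(s for s in running if s not in rules.stop_sequence)
    return listed + unlisted
```

With a rule that ranks "interactive" ahead of "batch", `preferred` returns the interactive mode first, which corresponds to the priority example given above.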

In addition, the invention provides for the system-related data to be operating plans, which regulate, in particular, run times and availability of individual hardware and/or software systems, and/or information regarding the operating state of individual systems, said information relating, in particular, to the current and/or future and/or periodic workload, and/or an operator's wishes which have been input at the central and/or individual system level using an input device. In contrast to the data mentioned in the preceding section, these system-related data are of a less general nature and relate more to the current operation of the systems. In this case, the central program means receives, for example, data indicating that an application which accesses a database that is currently heavily loaded is currently running. If there is then a fault in an application entity and in a database entity required by the latter, the central program means can use these system-related data to access the rule data, which specify, for example, that, in such a case, the fault in the database entity must be rectified first. In this case, it is necessary to take into account operator wishes, which a user can input at the central and/or individual system level using an input device, in order to ensure ease of operation and to enable variable operation.

The central data processing device expediently receives the information regarding the operating state of individual systems in an active and/or passive manner. The task of receiving and collecting the information can thus be adapted depending on the conditions of the system environment. For example, it may be advantageous for the central data processing device to be provided, as standard, with routine data associated with normal operation, while it independently actively requests special data in the case of faults or reconfiguration problems, for example.

The invention provides for the information to relate to hardware in the form of clients and/or servers and/or networks and/or storage systems and/or software in the form of applications and/or distributed applications having services that are dependent on one another and/or distributed application systems having virtualized services that are dependent on one another and/or are independent of one another and/or databases and/or front ends. More or less system-related information regarding the hardware and software is required depending on the design of the underlying system environment. Server/client networks and storage units or storage systems play a prominent role in connected system environments. Databases are usually accessed from a plurality of systems, so that the information relating to the latter should be centrally available. The same applies to distributed application systems, in particular in the field of "adaptive computing", since in this case configuration changes have to be centrally monitored.

Provision is expediently made for the control data which are generated by the central program means to control the starting and/or stopping and/or addition of services and/or the movement of services and/or applications and/or the maintenance of a distributed hardware and/or software system. In this manner, the central program means causes an application to be started or a hardware system to be stopped, for example. Individual services, for example interactive mode, batch processing, accounting, printing, messaging or a web service, can be added or, if they are no longer needed or are needed again only after a particular period of time has elapsed, can be moved. Applications which are currently not required can similarly be moved. Maintenance, for example when installing and updating applications, can be centrally controlled in an analogous manner. Applications can thus be installed autonomously and centrally on the basis of the acknowledgments which are received in the individual updating and installation steps. If an application environment is to be stopped again, the decision-specific control data are based, as when starting, on a sequence, and alternative routines are heeded. It is also possible to reconfigure a software system, for example, in a similar manner.

One refinement of the invention provides for the operation-related decisions to comprise the determination of administrative tasks and/or chains of tasks. A task may be, for example, the monitoring of a particular system. Chains of tasks comprise tasks that are to be executed in a particular order, for example the coordinated stopping of a plurality of systems.

Provision is also made for the central program means to autonomously separate administrative tasks and/or chains of tasks into subtasks taking into account logical and/or temporal relationships and/or dynamic influences and/or availability data and/or priorities and/or grouping data and/or classification data and/or application data which are present in the data processing device, in particular for the purpose of moving and/or replacing application entities. If, for example, it is necessary to reconfigure a system environment, a chain of a large number of tasks needs to be executed for this purpose. An application whose functionality is based on a database can, on account of this logical relationship, only be put back into operation after the database. Temporal relationships exist if, for example, it is necessary to resort to earlier results. In addition, it may be expedient to initially put only system entities of a particular class back into operation in order to establish a basic functionality, for example. In this case, separation into subtasks makes it possible to execute chains of tasks in a locally distributed manner and to take into account temporal conditions.
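
One way to picture the separation of a chain of tasks into subtasks that respect logical relationships is a dependency graph processed in stages. The following Python sketch uses the standard library module `graphlib` (Python 3.9 or later); the task names are hypothetical, and grouping into stages is an illustrative technique chosen here, not something the description specifies.

```python
from graphlib import TopologicalSorter


def split_into_stages(depends_on):
    """Group tasks into stages.

    `depends_on` maps each task to the tasks that must finish before it.
    All tasks within one stage are independent of each other and could
    therefore be executed in a distributed manner.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError for circular dependencies
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for a stable order
        stages.append(ready)
        sorter.done(*ready)
    return stages
```

For a chain in which an application server depends on a database and two further services depend on the application server, the database forms the first stage, the application server the second, and the two services share the third.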

It is also advantageous if the central program means checks the temporal progression of the administrative tasks and/or chains of tasks, which are transmitted to the individual hardware and/or software systems in the form of control data, continuously and/or at particular intervals of time. In this manner, faults and problems which possibly arise are discovered as a matter of routine in the course of operation. If necessary, the execution of a chain of tasks can be interrupted. However, variable reactions to the faults and problems, which go beyond interruption, are also possible on the basis of the available rule and performance data.
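
A check of the temporal progression could, for instance, compare the elapsed time of each unfinished task with a permitted duration. The sketch below assumes such a per-task time limit exists; the `TaskStatus` fields and the two outcomes "interrupt" and "continue" are hypothetical simplifications, since the description also allows reactions that go beyond interruption.

```python
from dataclasses import dataclass


@dataclass
class TaskStatus:
    """Progress record for one task of a chain (hypothetical fields)."""
    name: str
    started_at: float   # start time in seconds on any consistent clock
    time_limit: float   # permitted duration in seconds (assumed rule value)
    finished: bool = False


def overdue_tasks(statuses, now):
    """Return the names of unfinished tasks that have exceeded their time limit."""
    return [
        s.name
        for s in statuses
        if not s.finished and now - s.started_at > s.time_limit
    ]


def check_chain(statuses, now):
    """Decide whether the chain should continue or be interrupted."""
    late = overdue_tasks(statuses, now)
    return ("interrupt", late) if late else ("continue", [])
```

Calling `check_chain` at fixed intervals corresponds to checking "at particular intervals of time"; calling it on every status update approximates a continuous check.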

One development of the invention provides for at least some of the distributed hardware and/or software systems to be assigned their own autonomous program means which are stored in data processing devices and are in the form of autonomous agents which are subordinate to the central program means. In this case, the autonomous program means or agents at the system level carry out administrative and monitoring tasks but they are subordinate to the central program means so that it is possible to avoid collisions in decisions which affect a plurality of systems in the system environment.

Provision is also made for the autonomous agent of an individual hardware and/or software system to access rule data which are prescribed at the system level in the data processing devices and comprise, in particular, rules for the individual system and/or the interaction with the central autonomous program means. Depending on the stipulation of these rules, the autonomous agent makes decisions for its respective system on the basis of the rules insofar as said decisions do not fall within the regulating sphere of the central autonomous program means. If the autonomous agent cooperates with the central autonomous program means, this cooperation is again subject to rules so that, for example, the two do not make operation-related decisions, which may differ from one another under certain circumstances, for the same area of the system.

The central program means and the autonomous agents of the individual hardware and/or software systems expediently interchange control and/or rule data via the communications networks. In this manner, the central program means receives information regarding control processes which have been carried out at the system level, for example the movement of a service, and may coordinate the central management and administration therewith. Conversely, the autonomous agent at the system level requires information regarding the operations in which the central program means has intervened in the system in order to avoid collisions or to prevent individual tasks from being processed twice.

It is advantageous if the central program means grants decision-making powers to the autonomous agents of the individual systems, and/or withdraws said decision-making powers, in a permanent or temporally restricted and/or dynamic manner using the communications networks. Such dynamic authorization makes it possible to react to changes in the system environment in a flexible manner. In the event of a fault, it is expedient, for example, for the central program means to be granted greater decision-making powers in order to first restore basic operation. In contrast, in the case of trouble-free operation, the decision-making powers of the autonomous agents can be increased if no problems are to be expected.
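
Granting and withdrawing decision-making powers, either permanently or for a limited time, can be sketched with a small registry. The class `PowerRegistry`, its method names and the power name used in the usage note are hypothetical; the transport of grants over the communications networks is left out.

```python
class PowerRegistry:
    """Tracks which agent currently holds which decision-making power."""

    def __init__(self):
        # (agent, power) -> expiry time in seconds, or None for a permanent grant
        self._grants = {}

    def grant(self, agent, power, expires_at=None):
        """Grant a power permanently (expires_at=None) or until expires_at."""
        self._grants[(agent, power)] = expires_at

    def withdraw(self, agent, power):
        """Withdraw a power; withdrawing a power that was never granted is a no-op."""
        self._grants.pop((agent, power), None)

    def may_decide(self, agent, power, now):
        """Return True if the agent holds the power at time `now`."""
        if (agent, power) not in self._grants:
            return False
        expires_at = self._grants[(agent, power)]
        return expires_at is None or now < expires_at
```

A central program means built along these lines could, for example, grant an agent the power "move_service" with an expiry during trouble-free operation and withdraw it again as soon as a fault is reported.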

The invention provides for the autonomous agents of the individual hardware and/or software systems to respectively transmit general and/or system-specific control data to the data processing device of the central program means via a communications network and/or to publish said data in generally accessible file systems and/or to collaborate in the separation of administrative tasks and/or chains of tasks into subtasks. The term publication means that data which are of interest beyond individual system levels are made available to the central program means or else to other subsystems using a generally accessible file system (blackboard). Separating the tasks at the individual system level eases the burden on the central program means, and dividing the tasks into subtasks at the individual system level is also more expedient in specific systems.
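
Publication on a blackboard can be pictured as each system writing a small file into a shared directory that the central program means and other systems read. The sketch below is one possible reading of that idea: the JSON format, the file naming scheme `<topic>.<system>.json` and both function names are assumptions, and topic and system names are assumed to contain no dots.

```python
import json
from pathlib import Path


def publish(board_dir, system, topic, data):
    """Write one system's data for a topic as a JSON file on the blackboard."""
    path = Path(board_dir) / f"{topic}.{system}.json"
    path.write_text(json.dumps(data), encoding="utf-8")
    return path


def read_topic(board_dir, topic):
    """Collect what every system has published for a topic, keyed by system name."""
    entries = {}
    for path in sorted(Path(board_dir).glob(f"{topic}.*.json")):
        # Strip the "<topic>." prefix and the ".json" suffix to recover the system name.
        system = path.name[len(topic) + 1 : -len(".json")]
        entries[system] = json.loads(path.read_text(encoding="utf-8"))
    return entries
```

Because every system writes only its own file, two systems publishing on the same topic do not overwrite each other in this sketch.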

One development of the invention provides for the central program means to be operated in different operating modes, in particular in a fully autonomous or partially autonomous manner and/or with different reaction speeds. These different operating modes can be selected depending on the current operating conditions. Simple standard operation can be carried out in a fully autonomous manner but partially autonomous operation will generally be expedient in the event of faults. The speed at which the program means reacts to a given situation needs to be geared to all of the operations which take place in the system environment. In an individual case, a slow reaction may be expedient in order to conclude a particular operation before the reaction. In the case of relatively serious problems, it is often necessary to react quickly in order to prevent a chain of resultant problems.
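
The choice of operating mode and reaction speed can be expressed as two small decision functions. This is a sketch only: the severity labels "minor" and "major" and the delay values in seconds are invented for illustration and have no basis in the description.

```python
from enum import Enum


class Mode(Enum):
    FULLY_AUTONOMOUS = "fully autonomous"
    PARTIALLY_AUTONOMOUS = "partially autonomous"


def select_mode(fault_present):
    """Standard operation runs fully autonomously; faults switch to partial autonomy."""
    return Mode.PARTIALLY_AUTONOMOUS if fault_present else Mode.FULLY_AUTONOMOUS


def reaction_delay(severity, operation_in_progress):
    """Return the number of seconds to wait before reacting (assumed values)."""
    if severity == "major":
        # Serious problems are handled at once to avoid follow-on problems.
        return 0.0
    # For minor problems, a running operation is given time to finish first.
    return 30.0 if operation_in_progress else 5.0
```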

Provision is expediently made for the operation of the central program means in the partially autonomous mode to be changed and/or interrupted by manual inputs on an input device by an authorized administrator. This ensures that, in the case of rare problems or faults or else special operating requirements for which there are no rules under certain circumstances, operation can still be controlled manually.

In addition, it may be expedient for the operation of the central program means in the partially autonomous mode to be changed and/or interrupted by the autonomous agents of the individual systems. Such a restriction of the autonomous operation of the central program means is expedient when the autonomous agents at the individual system level are working on their system in a comparatively independent manner without interchanging a relatively large amount of data with the central program means, with the result that, in the event of a fault, the central program means may be lacking information which the autonomous agent has and which renders it necessary to change the central operation. The autonomous agent can then arrange for this change to be made.

It is advantageous if the central program means comprises a notification component which uses an output device to output information regarding substeps of the work of the central program means and/or the processing state thereof. An administrator or operator thus receives information regarding the progression of system operation and accordingly knows, for example, when tasks whose results he requires will be concluded. In addition, the administrator can coordinate any planned manual interventions with the given processing state. Malfunctions can be quickly detected.

One refinement provides for the distributed hardware and/or software systems to comprise at least one application system. The at least one application system may comprise a plurality of entities which each control at least one service, in particular interactive mode and/or batch mode and/or accounting and/or printing and/or messaging and/or network services. Messaging services make it possible to communicate and interchange notifications, while network services are responsible, on the one hand, for internal networks and, on the other hand, for the connection to principally external networks such as the Internet, for example in the form of web services. The different entities of an application form a logical system with corresponding relationships.

Provision is also made for a plurality of application systems to cooperate in a system family. This constellation is typical of relatively large configurations, in which a number of relationships can again exist between the individual systems if, for example, application systems build on one another or are dependent on one another.

In addition, it is possible for at least one application system to be operated in a virtual environment without fixed hardware assignment. Using the method according to the invention with the central autonomous program means is particularly advantageous in such a case, in which the assignment between the application and the hardware varies and cannot be readily identified from the outside, since conventional management and administration methods provide only inadequate and complicated solutions here.

Provision is also made for the distributed hardware and/or software systems to comprise client/server systems and/or operating systems. Client/server systems are of central importance in modern computer environments. This applies, in particular, in "adaptive computing". The corresponding operating systems form the connection to the application systems.

In addition, the invention relates to a system for managing and monitoring the operation of a plurality of distributed hardware and/or software systems that are integrated into at least one communications network, said system comprising a data processing device and a central autonomous program means that is stored in the latter and/or autonomous agents (which are stored in data processing devices) for individual hardware and/or software systems and/or input and/or output devices at the central and/or individual system level and being designed to carry out the method as described above.

Further advantages, features and details of the invention will be described below with reference to a particularly suitable exemplary embodiment.



PCT/EP2005/050889
2004P02085WOUS 




The figure shows a schematic diagram for carrying out the method according to the invention.

The central program means is stored in a data processing device which is not illustrated here. There is a connection to an input/output device. Here, an operator or administrator can make inputs, for example in order to change or interrupt the operation of a central program means that is operating in the partially autonomous mode, or can follow the notifications from the central program means regarding the substeps of the work and the processing state of the latter. Two system families x and y which comprise, for example, cooperating applications are subordinate to the central program means. Each of the two system families comprises two subsystems, namely the systems A and D, and B and C, respectively.

The central program means and the individual systems each interact with the blackboards (generally accessible file systems). If appropriate, the individual systems publish general and/or system-specific control data, which are intended to be accessible not only to the central program means but also to further individual systems, on the blackboards, in particular using their autonomous agents. This is of interest when the data can affect other systems, for example when applications depend on one another. The individual systems, for their part, provide the central program means with control and rule data via communications networks. In addition, they collaborate in the separation of administrative tasks or chains of tasks into subtasks.

The systems A - D are responsible for different services a - l. These services may comprise, for example, interactive or batch processing, accounting, printing, messaging and web services. The systems are operated in a distributed manner, with the result that the services associated with a system are respectively implemented in different autonomous individual systems. In the case illustrated, these individual systems are autonomous hardware systems 1 - 5 which are composed of heterogeneous hardware components. Each system is provided with individual hardware and an operating system (not illustrated here). The services a and d of the system A run on the autonomous individual system 1 and the service d is simultaneously also operated in the individual system 3, while a further service e of the system A is located in the individual system 4. This assignment of the services of the systems A - D to the individual systems 1 - 5 varies dynamically depending on the current requirements of the overall system environment. There is no fixed assignment between the application and the hardware resources. For example, the service j, which belongs to the application system D and is initially running on the autonomous individual system 3, is changed over to operation in the autonomous individual system 5.
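
The dynamically varying assignment of services to individual systems can be modelled as a mapping from each service to the set of individual systems it currently runs on. The following sketch uses the placements named in the text (services a, d, e and j on the individual systems 1, 3 and 4); the function name and the choice of a dictionary of sets are assumptions for illustration.

```python
def move_service(placement, service, source, target):
    """Return a new placement in which `service` runs on `target` instead of `source`.

    `placement` maps a service name to the set of individual systems it runs on.
    The input mapping is left unchanged.
    """
    hosts = placement.get(service, set())
    if source not in hosts:
        raise ValueError(f"service {service!r} is not running on system {source}")
    updated = dict(placement)
    updated[service] = (hosts - {source}) | {target}
    return updated


# Placement as described for the figure: a and d on system 1, d also on 3,
# e on 4, and j initially on 3.
placement = {"a": {1}, "d": {1, 3}, "e": {4}, "j": {3}}
after_move = move_service(placement, "j", source=3, target=5)
```

After the call, `after_move["j"]` is `{5}` while the original `placement` still records service j on system 3, so earlier states remain available for comparison.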

The autonomous agents of the individual systems and the central program means collect and process information regarding operation, taking into account the changing assignments, and derive autonomous decisions from said information. Since the individual systems A - D, for their part, have autonomous powers (not illustrated here), the amount of information that needs to be interchanged overall in the system environment is reduced and a multiplicity of reaction possibilities which can each be attributed to simple reactions are produced. The central program means can be operated in a fully autonomous or partially autonomous manner. In the partially autonomous mode, the operation of the central program means can be changed or interrupted by inputs by an administrator on the input/output device or by the autonomous agents of the individual systems. Since there is no fixed assignment between the hardware and software, it is possible to make optimum use of the hardware resources. As illustrated here, the same services may run on different autonomous individual systems. For example, the service e can be operated in the individual systems 2, 4 and 5. If one of these systems is particularly heavily loaded, the application system which is responsible for this service, for example, can alternatively allow the service to run on another hardware system. The central program means also enables effective management and effective monitoring and administration in such a case of "adaptive computing" having virtual environments.
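
Choosing an alternative hardware system for a service when one candidate is heavily loaded can be reduced to picking the candidate with the lowest reported load. The sketch below takes the candidate systems 2, 4 and 5 for the service e from the text; the load figures in the usage note and the tie-breaking rule (lowest system number) are assumptions.

```python
def least_loaded(candidates, load):
    """Return the candidate system with the lowest load.

    `load` maps a system number to its current load. Ties are resolved in
    favour of the lowest system number, because the candidates are sorted
    first and min() returns the first minimum it encounters.
    """
    return min(sorted(candidates), key=lambda host: load[host])
```

With assumed loads of 0.9, 0.4 and 0.6 for the individual systems 2, 4 and 5, the call `least_loaded({2, 4, 5}, {2: 0.9, 4: 0.4, 5: 0.6})` selects system 4.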



